class: center, middle, inverse, title-slide

.title[
# Introduction to Data Science
]

.subtitle[
## Week 1: Getting Started
]

.author[
### Ugur Aytun
]

.institute[
### METU, Department of Economics | ECON 413
]

---

# What this course is about

--

- This course mainly aims to give you skills that complement a standard econometrics course.

--

- These skills include data cleaning, data wrangling, data visualization, and data analysis (regression, machine learning, etc.).

---

# What is data?

--

- Data is a collection of facts, such as numbers, words, measurements, observations, or just descriptions of things.

--

- Data can be qualitative (text, maps, photographs, social media such as X, Instagram, Spotify, ...) or quantitative (economic variables such as prices, wages, trade, output, consumption, etc.).

--

- Qualitative data can be analyzed in multiple ways. One common method is data coding: transforming the raw collected data into a set of meaningful categories that describe its essential concepts.

--

- Quantitative data can be measured in different ways, for example through e-commerce data (IKEA, hepsiburada, ...). This challenges traditional metrics such as CPI and GDP, because they are not able to capture economic activity adequately.

---

.h-400px[  ]

---

# Flight data

.h-400px[  ]

---

# Trade network

.h-400px[  ]

---

# Trade network

.h-400px[  ]

---

# Satellite data

.h-400px[  ]

---

# Why has data science become relevant in recent years?

--

- The amount of data generated by humans is increasing exponentially. This is due to the digitalization of the economy and society.

--

- Data can be used to predict future events, to understand past events, and to make better decisions.

---

# What is data science?

--

- Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data.

--

.h-400px[  ]

---

# What is R?
--

- R is a programming language and free software environment for statistical computing and graphics, supported by the R Foundation for Statistical Computing.

--

- It is not just a statistical package but a programming language that can be used for data manipulation, visualization, analysis, and reporting.

--

- It is open source and free to use. This has led to a large community of users and developers, so you can find and use the most up-to-date statistical methods and packages.

--

- You can also join discussion groups and ask questions about your problems, contributing to the vibrant community.

---

# Why R?

--

- R is a powerful tool for data analysis and visualization. It is widely used in academia, industry, and government.

--

- R talks to many data sources: Excel, SPSS, SAS, Stata, SQL, Google Sheets, and many more.

---

# What is RStudio?

--

- RStudio is an editor that makes it easier to write R code. It is a powerful tool that helps you write, run, and debug R code.

--

- You can think of RStudio as similar to other editors, e.g. Overleaf, Sublime Text, Atom, or Jupyter Notebook. Most R users prefer RStudio because it is user-friendly and has many features that make your life easier.

---

# R Screen

.h-400px[  ]

---

# RStudio screen

.h-400px[  ]

---

# RStudio screen

--

- The left panel is the script editor, where you write your code.

--

- The right panel is the console, where you run your code.

--

- The bottom panel is the environment (one of the most useful parts of R), where you can see the objects you created and the history of your commands.

--

- The top panel is the menu, where you can find the tools and options.

--

- You can also see the packages you installed and the help files of the functions you use.

--

- You can also see the plots you created.
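
---

# RStudio screen

The panels described on the previous slide can be tried out with a short session. This is only a minimal sketch: the `wages` vector and its values are hypothetical and chosen purely for illustration. It uses base R functions only.

```r
# Create an object; it should show up in the environment panel
# (the values are made up for illustration)
wages <- c(12.5, 15.0, 18.2, 21.7)

# Compute a summary; the result is stored as a second object
avg_wage <- mean(wages)
print(avg_wage)

# Draw a simple plot; it should show up in the plots tab
plot(wages, type = "b", main = "Hypothetical wages", ylab = "Wage")

# Open the help file for a function; it should show up in the help tab
help("mean")
```

Typing these lines in the script editor and running them one at a time makes it easy to see which panel reacts to each command.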